Shahrukh Mallick
CS 6670: Computer Vision
Project 1: Feature Detection and Matching
My Own Feature Descriptor (pseudo-SIFT implementation)
Description
For my own feature, I decided to implement my own crude version of SIFT. Here's how it works. I used the points generated by the Harris feature detector as the points of interest. You also need the gradient magnitude of every pixel; this involves much the same procedure as computing the Harris values, plus an additional step to take the magnitude of the gradient at each pixel.
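For concreteness, here is a minimal sketch of that gradient step in Python/NumPy. The choice of derivative filter (np.gradient's central differences) is an assumption; the actual implementation may reuse the same derivative filters as the Harris computation, or a Sobel filter.

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Per-pixel gradient magnitude and orientation (radians) for a
    grayscale image given as a 2-D float array. Central differences
    stand in for whatever derivative filter was actually used."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy), np.arctan2(gy, gx)  # orientation in [-pi, pi]
```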
Next, you iterate over all points of interest, take a 16x16 window around each one, and break that window into sixteen 4x4 cells. For each cell, you build an 8-bin orientation histogram from the gradient directions, ignoring pixels whose gradient magnitude falls below some threshold (tuning this is difficult). Concatenating the histograms of the 16 cells yields a 128-element descriptor for the feature, which you then normalize. For more exact details, there's plenty of documentation on SIFT online and in our class notes.
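Below is a rough sketch of the descriptor assembly in the same Python style. Two details are assumptions not spelled out above: bins accumulate gradient magnitude rather than raw pixel counts (as full SIFT does), and points closer than 8 pixels to the image border are assumed to have been filtered out beforehand.

```python
import numpy as np

def sift_like_descriptor(magnitude, orientation, y, x, mag_threshold):
    """128-element descriptor for the interest point at (y, x): a 16x16
    window split into sixteen 4x4 cells, each summarized by an 8-bin
    orientation histogram. Assumes (y, x) is at least 8 pixels from the
    border. Magnitude-weighted bins are an assumption."""
    desc = []
    for cy in range(4):                       # 4x4 grid of cells
        for cx in range(4):
            top, left = y - 8 + 4 * cy, x - 8 + 4 * cx
            hist = np.zeros(8)
            for i in range(top, top + 4):
                for j in range(left, left + 4):
                    m = magnitude[i, j]
                    if m < mag_threshold:     # ignore weak gradients
                        continue
                    # map orientation in [-pi, pi] onto 8 bins
                    b = int((orientation[i, j] + np.pi) / (2 * np.pi) * 8) % 8
                    hist[b] += m
            desc.extend(hist)
    desc = np.asarray(desc)                   # length 16 * 8 = 128
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # normalize the descriptor
```

In the full SIFT pipeline the window would also be rotated to a dominant orientation and the histogram contributions Gaussian-weighted; this sketch omits both, matching the simplified description above.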
Reason for major design choices
There weren't too many design choices of my own, since I was following the slides on the SIFT implementation. I experimentally determined a threshold value for the gradient magnitude by testing several values below the mean magnitude of the image. (In retrospect, starting from the median magnitude probably would have been wiser, since a few strong edges can skew the mean.)
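A hypothetical version of that tuning loop might look like the following; pick_threshold, score_fn, and the fraction grid are all illustrative stand-ins, with score_fn representing a rerun of the matching benchmark at each candidate threshold.

```python
import numpy as np

def pick_threshold(magnitude, score_fn, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Try gradient-magnitude thresholds at several fractions of the
    mean magnitude and keep the best-scoring candidate. score_fn is a
    caller-supplied callback (e.g. AUC on a validation image pair).
    Swapping np.mean for np.median gives the median-anchored variant
    suggested above."""
    candidates = [f * float(np.mean(magnitude)) for f in fractions]
    return max(candidates, key=score_fn)
```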
Performance
Provided below are several charts and tables on the performance of the three different descriptor types.
Figure 1 below shows the ROC curves of the three descriptors on the Yosemite pictures, with SIFT included as a reference. All three descriptors perform well (AUC above 0.87 in every case), with my own descriptor scoring highest (~0.95) among the three. MOPS was second best, and the simple window descriptor came in last. In all cases, the ratio test improved the AUC.
Figure 1: Yosemite ROC plot
AUC values for Yosemite:
Simple Window + SSD: 0.889366
Simple Window + Ratio: 0.929238
MOPS + SSD: 0.878177
MOPS + Ratio: 0.949780
My Own + SSD: 0.928682
My Own + Ratio: 0.951060
Figure 2 shows the ROC curves on the graf pictures. As expected, all the descriptors performed worse on these images than on Yosemite. Here, MOPS outperformed my own descriptor by a fairly large margin (about 10 points of AUC with SSD, 14 with the ratio test). The simple window descriptor did not perform well either, but that's to be expected: graf changes the viewing angle, and the simple window descriptor only handles translations.
Figure 2: Graf ROC plot
AUC values for Graf:
Simple Window + SSD: 0.571100
Simple Window + Ratio: 0.711916
MOPS + SSD: 0.765770
MOPS + Ratio: 0.860932
My Own + SSD: 0.662242
My Own + Ratio: 0.716466
Figure 3 shows an example plot of the match threshold's effect for the MOPS descriptor on the two image sets. Both plots indicate there is a good threshold value to use for matching. Similar plots can be made to determine optimal thresholds for the other two descriptors (the sweep itself is sketched below).
Figure 3: Threshold plot for Yosemite (left) and graf (right)
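A minimal sketch of how such a threshold sweep can be computed, assuming the benchmark exposes per-match distances and ground-truth correctness flags (an assumption about the harness, not its actual interface):

```python
import numpy as np

def threshold_sweep(distances, is_correct):
    """Sweep the match-acceptance threshold over all observed match
    distances and return (threshold, true-positive rate, false-positive
    rate) triples -- the data behind a plot like Figure 3."""
    distances = np.asarray(distances, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)
    pos = max(int(is_correct.sum()), 1)
    neg = max(int((~is_correct).sum()), 1)
    points = []
    for t in np.sort(distances):          # accept matches with distance <= t
        accepted = distances <= t
        tpr = (accepted & is_correct).sum() / pos
        fpr = (accepted & ~is_correct).sum() / neg
        points.append((t, tpr, fpr))
    return points
```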
Harris Operator Results (as requested)
Harris Operator Results on Yosemite1.jpg
Harris Operator Results on graf image (img1.ppm)
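For reference, a minimal sketch of the Harris response computation behind these images, using the det(H) - k·trace(H)² corner score; the smoothing sigma and k below are common textbook defaults, not necessarily the values used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response c = det(H) - k * trace(H)^2, where H is
    the Gaussian-smoothed structure tensor of image gradients."""
    gy, gx = np.gradient(img.astype(np.float64))
    sxx = gaussian_filter(gx * gx, sigma)   # structure tensor entries,
    syy = gaussian_filter(gy * gy, sigma)   # smoothed over a local window
    sxy = gaussian_filter(gx * gy, sigma)
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
```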
Benchmark Results
SW = Simple Window descriptor
MY = My own descriptor
MOPS = MOPS descriptor
| Descriptor | Bike: Avg Error / Avg AUC | Graf: Avg Error / Avg AUC | Leuven: Avg Error / Avg AUC | Wall: Avg Error / Avg AUC |
|---|---|---|---|---|
| SW + SSD | 525 / 61.25% | 270 / 50.43% | 401 / 30.29% | 336 / 47.76% |
| SW + ratio | 525 / 62.15% | 270 / 50.19% | 401 / 48.35% | 336 / 55.84% |
| MOPS + SSD | 529 / 52.70% | 294 / 51.77% | 310 / 68.04% | 361 / 63.19% |
| MOPS + ratio | 529 / 57.30% | 294 / 59.15% | 310 / 68.09% | 361 / 62.73% |
| MY + SSD | 491 / 49.31% | 264 / 53.01% | 302 / 55.59% | 303 / 54.52% |
| MY + ratio | 491 / 55.04% | 264 / 53.31% | 302 / 60.60% | 303 / 58.63% |
Table 1: Error and AUC values averaged over the images in each directory. Error is in pixels; AUC is the area under the ROC curve.
Overall, the descriptors performed decently, but there is still plenty of room for improvement. In almost all cases the ratio test improved results, confirming it is the better metric to use.
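For concreteness, here is a minimal sketch of the two scoring schemes compared throughout (plain SSD versus the ratio of best to second-best SSD). The brute-force loop is an illustration, not the benchmark's actual matching code.

```python
import numpy as np

def match_features(desc1, desc2, use_ratio=True):
    """Match each row of desc1 (N1 x 128) against desc2 (N2 x 128, with
    N2 >= 2) by SSD. The score is either the raw best SSD or the ratio
    of best to second-best SSD; lower is better either way."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)   # SSD to every candidate
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        score = ssd[best] / max(ssd[second], 1e-12) if use_ratio else ssd[best]
        matches.append((i, int(best), float(score)))
    return matches
```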
The simple window descriptor's strengths and weaknesses are the most obvious. Since it only handles translational changes, it struggles on images where the viewing angle changes. Hence it performed best on bike, where the images don't shift but only blur, and struggled on wall and graf because of the change in camera angle. A surprising result is how poorly it performed on Leuven. This can be partially explained by the threshold used during Harris feature selection: a fixed threshold selects very different features across pictures of varying brightness, and the effect is very apparent in the results. A possible way to improve this would be to tune the threshold and see which value works best for the simple window descriptor.
MOPS performed the best of the three descriptors, and did particularly well on the Leuven image set. This is likely because brightness changes don't affect MOPS as much, since its patch orientation comes from the gradient, which doesn't change across the images, and normalizing the patch intensities in the MOPS descriptor deals directly with what the Leuven image set tests. MOPS also did well on the wall image set, likely because its measured orientation angle helped in describing the features, since viewing angle was the main thing being tested there. Compared to those two sets, MOPS did not do as well on bike or graf. Graf is a very difficult image set, as the image warps a lot, leaving plenty of room for mistakes. The poor performance on bike is likely because blurring causes some loss of detail, making it harder for the MOPS descriptor to get strong gradients.
Lastly, my own descriptor did not do as well as I had hoped. Since it was modeled on SIFT, I expected it to be robust, but it had similar trouble on the bike and graf image sets as the MOPS descriptor did. However, it performed well on the Leuven and wall image sets, for similar reasons as MOPS.
Extra Credit
I was hoping my attempt at implementing SIFT would warrant some extra credit, but the results don’t seem to indicate the implementation was that successful. Points for effort?